DWS-AQA: A Cost Effective Approach for Very Large Data Warehouses
Abstract
Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to assure interactive response times for users. Large data warehouses require a very expensive setup, typically based on high-end servers or high-performance clusters. In this paper we propose and evaluate a simple but very effective method to implement a data warehouse using the computers and workstations typically available in large organizations. The proposed approach is called data warehouse striping with approximate query answering (DWS-AQA). The goal is to use the processing and disk capacity normally available in large workstation networks to implement a data warehouse at a very low infrastructure cost. Because the data warehouse shares computers that are also used for other purposes, most of the time only a fraction of them will be able to execute the partial queries in time. However, as we show in the paper, the approximate answers estimated from partial results have a very small error in most plausible scenarios. Moreover, as the data warehouse facts are partitioned in a strictly uniform way, it is possible to calculate tight confidence intervals for the approximate answers, providing the user with a measure of the accuracy of the query results. A set of experiments on the TPC-H benchmark database is presented to show the accuracy of DWS-AQA for a large number of
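The idea of estimating an answer from the nodes that respond in time can be illustrated with a minimal sketch. It is not the paper's estimator, which this abstract does not spell out. It assumes round-robin striping of fact rows, treats the rows on the responding nodes as a simple random sample drawn without replacement, and uses the textbook scaled-mean estimator with a finite population correction and a normal approximation for the interval. The names `stripe_round_robin` and `estimate_sum`, and the parameter `z`, are hypothetical.

```python
import math
import random
import statistics


def stripe_round_robin(rows, num_nodes):
    """Assign row i to node i % num_nodes (round-robin striping)."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def estimate_sum(nodes, available, z=1.96):
    """Estimate SUM over all rows using only the nodes listed in `available`.

    Assumption: the rows held by the available nodes behave like a simple
    random sample of all rows. Returns (estimate, half_width), where
    estimate +/- half_width is an approximate confidence interval
    (z=1.96 corresponds to roughly 95% under a normal approximation).
    """
    total_rows = sum(len(node) for node in nodes)  # assumed known from metadata
    sample = [value for idx in available for value in nodes[idx]]
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled rows to estimate variance")

    # Scale the sample mean up to the full row count.
    estimate = total_rows * statistics.fmean(sample)
    if n == total_rows:
        return estimate, 0.0  # every node answered: the result is exact

    # Finite population correction shrinks the interval as more nodes answer.
    fpc = 1.0 - n / total_rows
    sample_variance = statistics.variance(sample)
    half_width = z * total_rows * math.sqrt(fpc * sample_variance / n)
    return estimate, half_width


if __name__ == "__main__":
    rng = random.Random(42)
    facts = [rng.uniform(0.0, 100.0) for _ in range(10_000)]
    nodes = stripe_round_robin(facts, num_nodes=10)
    est, half = estimate_sum(nodes, available=[0, 2, 5, 7])
    print(f"true sum      = {sum(facts):.1f}")
    print(f"estimated sum = {est:.1f} +/- {half:.1f}")
```

In this sketch the interval narrows as more nodes respond and collapses to zero width when all of them do, which mirrors the abstract's point that uniform partitioning makes the accuracy of a partial answer quantifiable.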
Similar resources
A middle layer for distributed data warehouses using the DWS-AQA technique
The DWS (Data Warehouse Striping) technique is a round-robin data partitioning approach especially designed for distributed data warehouse environments. In DWS the fact tables are distributed across an arbitrary number of computers and the queries are executed in parallel by all the computers, guaranteeing nearly optimal speedup and scale-up. This technique is combined with an approximate query a...
Efficient Data Distribution for DWS
The DWS (Data Warehouse Striping) technique is a data partitioning approach especially designed for distributed data warehousing environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and the queries are executed in parallel by all the computers, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically a heavy pr...
Scalable Maintenance of Multiple Interrelated Data Warehousing Systems
The maintenance of data warehouses (DWs) is becoming an increasingly important topic due to the growing use, derivation and integration of digital information. Most previous work has dealt with a single centralized data warehouse only. In this paper, we focus on environments with multiple DWs that are possibly derived from other DWs. In such a large-scale environment, data updates from base sourc...
Requirements Engineering for Data Warehouses
Data Warehouses (DWs) aim at supporting the decision-making process of an organization. In the Requirements Engineering (RE) domain, several methods were proposed for the development of DWs, most of them based on the Goal-Oriented Requirements Engineering (GORE) approach. However, there is not yet a comprehensive and unified perspective of the various methods proposed. In this paper, a coherent...
Data Warehouse Striping: Improved Query Response Time
The increasing use of decision support systems has led to an explosion in the amount of business information that data warehouses must manage. Therefore, data warehouses must offer efficient Online Analytical Processing (OLAP) that provides tools to satisfy the information needs of business managers, helping them to make faster and more effective decisions. Improving query response time i...